How degenerate is the parametrization of neural networks with the ReLU activation function?
Dennis Maximilian Elbrächter, Julius Berner, Philipp Grohs

Neural network training is usually accomplished by solving a non-convex optimization problem using stochastic gradient descent. Although one optimizes over the network's parameters, the main loss function generally depends only on the realization of the neural network, i.e., the function it computes. Studying the optimization problem over the space of realizations opens up new ways to understand neural network training. In particular, usual loss functions like mean squared error and categorical cross entropy are convex on spaces of neural network realizations, which themselves are non-convex. The approximation capabilities of neural networks can be used to deal with the latter non-convexity, which allows us to establish that, for sufficiently large networks, local minima of a regularized optimization problem on the realization space are almost optimal.

Keywords: inverse stability
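The degeneracy referred to in the title is visible already at the level of a single ReLU neuron: scaling its incoming weights and bias by any c > 0 while dividing its outgoing weight by c changes the parameters but not the realization, so any loss that depends only on the realization is unchanged as well. A minimal NumPy sketch of this invariance (the network sizes, random data, and scaling factor below are illustrative choices, not taken from the paper):

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def realization(params, x):
    """One-hidden-layer ReLU network: x -> V @ relu(W @ x + b)."""
    W, b, V = params
    return V @ relu(W @ x + b[:, None])

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))   # hidden-layer weights
b = rng.normal(size=3)        # hidden-layer biases
V = rng.normal(size=(1, 3))   # output weights

# Rescale neuron 0: multiply its incoming weights and bias by c > 0
# and divide its outgoing weight by c; positive homogeneity of ReLU
# guarantees the realization is unchanged.
c = 10.0
W2, b2, V2 = W.copy(), b.copy(), V.copy()
W2[0] *= c
b2[0] *= c
V2[:, 0] /= c

x = rng.normal(size=(2, 100))   # a batch of 100 inputs
y = rng.normal(size=(1, 100))   # arbitrary regression targets

out1 = realization((W, b, V), x)
out2 = realization((W2, b2, V2), x)
print(np.allclose(out1, out2))                          # True: same function
print(np.isclose(np.mean((out1 - y) ** 2),
                 np.mean((out2 - y) ** 2)))             # True: same mean squared error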
Reviews: How degenerate is the parametrization of neural networks with the ReLU activation function?
I read the author response and the other reviews. The author response provides a nice additional demonstration of the implications of connecting the two problems via inverse stability. This is an interesting and potentially important paper for future research on this topic. The paper explains the definition of inverse stability, proves its implications for neural network optimization, exhibits failure modes in which inverse stability does not hold, and proves inverse stability for a simple one-hidden-layer network with a single output. Originality: The paper definitely provides a very interesting and unique research direction.
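To make the review self-contained, the notion it refers to has roughly the following shape: the parametrization is called inverse stable if, whenever a realization is close to the one produced by a given parameter vector, it can also be produced by a nearby parameter vector. A schematic LaTeX rendering of this idea (with placeholder constant s, exponent \alpha, and unspecified norms; the precise quantifiers, norms, and parameter domain are those given in the paper, not the ones shown here):

% Schematic form of inverse stability for the realization map \mathcal{R}
% (s > 0 and \alpha > 0 are placeholders, as are the norms).
\[
  \forall\, \theta \in \Omega,\ \forall\, g \in \mathcal{R}(\Omega)\
  \exists\, \theta' \in \Omega:\quad
  \mathcal{R}(\theta') = g
  \quad\text{and}\quad
  \|\theta - \theta'\| \,\le\, s\,\|\mathcal{R}(\theta) - g\|^{\alpha}.
\]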